Pre Work

Imports, seaborn style settings, etc.

Function and Class Declarations

1. Import Data, Initial Analysis

2. Univariate Analysis

Categorical Columns

3. Bivariate Analysis

4. Modelling

4a. Scaling Data - this is important because the clustering algorithms used here are distance based: without scaling, columns with larger scales would dominate the distance calculation and therefore the clustering
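To illustrate why scaling matters, here is a minimal sketch. It assumes z-score scaling via sklearn's StandardScaler (the notebook may use a different scaler), and the two columns and their values are made-up toy data, not the bank dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): two hypothetical columns on very different scales,
# e.g. a credit limit in currency units and a small count.
X = np.array([
    [100000.0, 2.0],
    [50000.0, 7.0],
    [8000.0, 4.0],
    [30000.0, 9.0],
])

# Z-score scaling: subtract each column's mean, divide by its standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Euclidean distance between the first two rows, before and after scaling.
dist_raw = np.linalg.norm(X[0] - X[1])
dist_scaled = np.linalg.norm(X_scaled[0] - X_scaled[1])

print("raw distance:   ", dist_raw)     # driven almost entirely by the first column
print("scaled distance:", dist_scaled)  # both columns contribute
```

Before scaling, the difference in the small-count column is practically invisible in the distance; after scaling, each column has mean 0 and unit variance, so both influence the result.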

4b. K-Means Clustering

Distortions and elbow plot
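A rough sketch of how distortions for an elbow plot can be computed. Two assumptions here: "distortion" is taken to mean the average distance from each point to its nearest centroid (other definitions, such as inertia, are also common), and synthetic blobs stand in for the scaled data.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in (assumption) for the scaled data: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

k_values = list(range(1, 9))
distortions = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # Distance from each point to its nearest centroid, averaged over all points.
    nearest = np.min(cdist(X, model.cluster_centers_, "euclidean"), axis=1)
    distortions.append(nearest.mean())

for k, d in zip(k_values, distortions):
    print(f"k={k}: distortion={d:.3f}")
```

Plotting `distortions` against `k_values` gives the elbow plot; the "elbow" is the k after which further increases yield only small reductions.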

Examining silhouette scores

Silhouette visualiser for k = 3 to 6, showing the per-cluster silhouette scores
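As a sketch of the silhouette check, the loop below computes the mean silhouette score for several values of k. The data is synthetic (an assumption), and the range of k is illustrative rather than the one used in the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in (assumption) for the scaled data: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    # Mean silhouette over all points: near 1 is well separated, near 0 is overlapping.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

The silhouette score complements the elbow plot: it measures both cohesion and separation, so it can help when the elbow is ambiguous.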

Plotting 2d representation of cluster spreads
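One way to get a 2d view of clusters in a higher-dimensional feature space is to project onto the first two principal components. This is a sketch assuming PCA is the projection; the notebook may use a different technique, and the data here is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in (assumption): five features, three blobs.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Project the five features onto the two directions of greatest variance.
pca = PCA(n_components=2, random_state=42)
coords = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
# coords[:, 0] and coords[:, 1] can be passed to a scatter plot, coloured by `labels`.
```

The explained variance ratio indicates how much of the original spread the 2d picture retains; if it is low, apparent overlap in the plot may not reflect real overlap.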

Examining Clusters
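A common way to examine clusters is to group the original (unscaled) data by cluster label and compare per-cluster means and sizes. The sketch below uses hypothetical column names and toy values; neither is taken from the dataset.

```python
import pandas as pd

# Toy rows with hypothetical column names; the values are illustrative only.
df = pd.DataFrame({
    "avg_credit_limit": [10000, 12000, 50000, 55000, 150000, 160000],
    "total_cards": [2, 3, 5, 6, 9, 10],
    "cluster": [0, 0, 1, 1, 2, 2],
})

# Mean of every feature per cluster, plus the number of rows in each cluster.
profile = df.groupby("cluster").mean()
profile["n_customers"] = df.groupby("cluster").size()
print(profile)
```

Profiling on the unscaled columns keeps the numbers in their original units, which makes the cluster descriptions easier to interpret.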

Cluster Characteristics:

4c. Hierarchical Clustering

First we check for the best cophenetic correlation coefficient among the following linkage methods and distance metrics
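A sketch of such a search using scipy. The lists of metrics and methods below are assumptions chosen for illustration, and the data is synthetic. Ward, centroid and median linkage are left out of this grid because they are only correctly defined for Euclidean distance.

```python
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs

# Synthetic stand-in (assumption) for the scaled data.
X, _ = make_blobs(n_samples=60, centers=3, n_features=4, random_state=42)

metrics = ["euclidean", "chebyshev", "cityblock"]        # illustrative choice
methods = ["single", "complete", "average", "weighted"]  # illustrative choice

results = {}
for metric in metrics:
    # Condensed pairwise distance matrix for this metric.
    Y = pdist(X, metric=metric)
    for method in methods:
        Z = linkage(Y, method=method)
        # Correlation between original distances and dendrogram (cophenetic) distances.
        c, coph_dists = cophenet(Z, Y)
        results[(metric, method)] = c

best = max(results, key=results.get)
print("best (metric, method):", best, "coefficient:", round(results[best], 3))
```

A coefficient closer to 1 means the dendrogram preserves the original pairwise distances more faithfully.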

Now we check the cophenetic correlation coefficient for median, ward and centroid linkage, which only work with metric = euclidean

Evaluating various dendrogram structures for euclidean with different linkage methods

Checking that our custom class works the same as the parent sklearn.cluster.AgglomerativeClustering

Silhouette scores for different cluster numbers

Distortions and elbow plot

Cluster representation in 2d

Examining clusters

Comparing Hierarchical and K-Means Clusters
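Two label sets can be compared with a contingency table and the adjusted Rand index, both of which ignore how the clusters happen to be numbered. This is a sketch on synthetic data, assuming ward linkage for the hierarchical model; the notebook's own comparison may be done differently.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix

# Synthetic stand-in (assumption) for the scaled data: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Rows: K-Means clusters, columns: hierarchical clusters, cells: shared point counts.
table = contingency_matrix(kmeans_labels, hier_labels)
# 1.0 means identical partitions; values near 0 mean chance-level agreement.
ari = adjusted_rand_score(kmeans_labels, hier_labels)

print(table)
print("adjusted Rand index:", round(ari, 3))
```

If each row of the table has a single dominant cell, the two methods found essentially the same segments under different label numbers.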

Cluster Characteristics:

Cluster Specific Recommendations

Key Questions

  1. How many different segments of customers are there?
    • There are 3 segments
  2. How are these segments different from each other?
    • The segments differ considerably both in their characteristics (average credit limit and number of cards held) and in their engagement behaviour (in-person visits, online visits, phone calls)
    • Please see the Cluster Characteristics section above for a description of the key differences
  3. What are your recommendations to the bank on how to better market to and service these customers?
    • Please see the Cluster Specific Recommendations section above for suggestions

Appendix

Appendix 1: Alternate Method of Hierarchical Clustering

Comparing clusters with Agglomerative and this method
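The alternate method is not specified in this outline. As one possibility, the sketch below assumes it is scipy's linkage followed by fcluster, and compares the resulting labels with sklearn's AgglomerativeClustering on synthetic data.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in (assumption) for the scaled data: three well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# scipy route: build the ward linkage matrix, then cut the tree into 3 flat clusters.
Z = linkage(X, method="ward")
scipy_labels = fcluster(Z, t=3, criterion="maxclust")  # labels start at 1

# sklearn route: same linkage criterion, labels start at 0.
sklearn_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# The adjusted Rand index is unaffected by the different label numbering.
ari = adjusted_rand_score(sklearn_labels, scipy_labels)
print("adjusted Rand index:", round(ari, 3))
```

Because the two libraries number clusters differently, comparing raw labels element by element would be misleading; a permutation-invariant score avoids that.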

Appendix 2: Clusters with Different linkages